27 research outputs found

Future of networking is the future of Big Data, The

Author: Shannigrahi Susmit
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Summer. Includes bibliographical references. Scientific domains such as Climate Science, High Energy Particle Physics (HEP), Genomics, Biology, and many others are increasingly moving towards data-oriented workflows where each of these communities generates, stores, and uses massive datasets that reach into terabytes and petabytes and are projected to reach exabytes soon. These communities are also increasingly moving towards a global collaborative model where scientists routinely exchange a significant amount of data. The sheer volume of data and the complexities of maintaining, transferring, and using it continue to push the limits of current technologies in multiple dimensions: storage, analysis, networking, and security. This thesis tackles the networking aspect of big-data science. Networking is the glue that binds all the components of modern scientific workflows, and these communities are becoming increasingly dependent on high-speed, highly reliable networks. The network, as the common layer across big-science communities, provides an ideal place for implementing common services. Big-science applications also need to work closely with the network to ensure optimal usage of resources and intelligent routing of requests and data. Finally, as more communities move towards data-intensive, connected workflows, adopting a service model where the network provides some of the common services reduces not only application complexity but also the need for duplicate implementations. Named Data Networking (NDN) is a new network architecture whose service model aligns better with the needs of these data-oriented applications. NDN's name-based paradigm makes it easier to provide intelligent features at the network layer rather than at the application layer. This thesis shows that NDN can push several standard features to the network.
This work is the first attempt to apply NDN in the context of large scientific data; in the process, this thesis touches upon scientific data naming, name discovery, real-world deployment of NDN for scientific data, feasibility studies, and the designs of in-network protocols for big-data science.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Supporting Climate Research using Named Data Networking

Author: Catherine Olschanowsky
Christos Papadopoulos
Susmit Shannigrahi
Publication date: 03/04/2020

Climate and other big data applications face substantial problems in terms of data storage, retrieval, sharing, and management. While several community repositories and tools are available to help with climate data, these problems persist and the community is actively looking for better solutions. In this project we apply NDN to support climate modeling applications. The information-centric nature of NDN, where content becomes a first-class entity, simplifies many of the problems in this domain. NDN offers lightweight data publication, discovery, and retrieval compared to IP-based solutions. However, introducing a new network architecture to a mature domain that routinely produces petabytes of data and uses a plethora of assorted tools to manipulate it is a risky proposition. The advantages of NDN alone may not be sufficient to overcome the natural inertia. Our approach is to introduce NDN while carefully avoiding undue disruption to existing workflows. To that end, we provide a user interface that uses familiar filesystem operations to publish, discover, and retrieve data, integrated with domain-specific translators that automatically convert and publish datasets as NDN objects. We outline the advantages of NDN in this application domain and the challenges we faced during the adaptation. We believe this is the first exercise in applying NDN in an existing, large, mature application domain.

CiteSeerX

Managing scientific data with named data networking

Author: Dibenedetto Steve
Fan Chengyu
Newman Harvey
Olschanowsky Catherine
Papadopoulos Christos
Shannigrahi Susmit
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that are not well supported by the IP network architecture. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover that would otherwise be painstakingly (re-)implemented by each application using point-to-point semantics in an IP network. We present the first scientific data management application designed and implemented on top of NDN. We use this application to manage climate and HEP data over a dedicated, high-performance testbed. Our application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.

University of Memphis Digital Commons

Crossref

Caltech Authors

Named Data Networking based File Access for XRootD

Author: Balcas Justas
Fan Chengyu
Iordache Cǎtǎlin
Liu Ran
Newman Harvey
Shannigrahi Susmit
Wu Yuanhao
Yeh Edmund
Šrivinskas Raimondas
Publication venue: 'EDP Sciences'
Publication date: 16/11/2020

We present the design and implementation of a Named Data Networking (NDN) based Open Storage System plug-in for XRootD. This is an important step towards integrating NDN, a leading future internet architecture, with the existing data management systems in CMS. This work outlines the first results of data transfer tests using internal as well as external 100 Gbps testbeds, and compares the NDN-based implementation with existing solutions.

EDP Sciences OAI-PMH repository (1.2.0)

Caltech Authors

Hydra -- A Federated Data Repository over NDN

Author: Afanasyev Alex
Ai Xusheng
Brandel Tym
Feltus F. Alex
Patil Varun
Podder Proyash
Presley Justin
Shannigrahi Susmit
Wang Xi
Yu Tianyuan
Zhang Lixia
Publication date: 02/11/2022

Today's big data science communities manage their data publication and replication at the application layer. These communities use myriad mechanisms to publish, discover, and retrieve datasets; the result is an ecosystem of either centralized repositories or collections of ad-hoc data repositories. Publishing datasets to centralized repositories can be process-intensive, and those repositories do not accept all datasets. The ad-hoc repositories are difficult to find and use due to differences in data names, metadata standards, and access methods. To address the problem of scientific data publication and storage, we have designed Hydra, a secure, distributed, and decentralized data repository made of a loose federation of storage servers (nodes) provided by user communities. Hydra runs over Named Data Networking (NDN) and uses the State Vector Sync (SVS) protocol, which lets individual nodes maintain a "global view" of the system. Hydra provides a scalable and resilient data retrieval service. Data distribution scales through NDN's built-in data anycast and in-network caching, and resiliency against individual server failures comes from automated failure detection and from maintaining a specific degree of replication. Hydra uses "Favor", a locally calculated numerical value, to decide which nodes will replicate a file. Finally, Hydra uses data-centric security for data publication and node authentication. Hydra uses a Network Operation Center (NOC) to bootstrap trust in Hydra nodes and data publishers. The NOC distributes user and node certificates and performs the proof-of-possession challenges. This technical report serves as the reference for Hydra. It outlines the design decisions, the rationale behind them, the functional modules, and the protocol specifications.

arXiv.org e-Print Archive

Named Data Networking in Climate Research and HEP Applications

Author: Barczyk Artur Jerzy
Liu Ran
Monga Inder
Mughal Azher
Newman Harvey
Papadopoulos Christos
Shannigrahi Susmit
Sim Alex
Vlimant Jean-Roch
Wu John
Yeh Edmund
Publication venue: 'AIP Publishing'
Publication date: 01/04/2015

The Computing Models of the LHC experiments continue to evolve from the simple hierarchical MONARC[2] model towards more agile models where data is exchanged among many Tier2 and Tier3 sites, relying on both large-scale file transfers with strategic data placement and an increased use of remote access to object collections with caching, for example through CMS's AAA, ATLAS' FAX, and ALICE's AliEn projects. The challenges presented by expanding needs for CPU, storage, and network capacity, as well as rapid handling of large datasets of file and object collections, have pointed the way towards future, more agile and pervasive models that make best use of highly distributed heterogeneous resources. In this paper, we explore the use of Named Data Networking (NDN), a new Internet architecture focusing on content rather than the location of the data collections. As NDN has shown considerable promise in another data-intensive field, Climate Science, we discuss the similarities and differences between the Climate and HEP use cases, along with specific issues HEP faces and will face during LHC Run2 and beyond, which NDN could address.

Caltech Authors

Named Data Networking in Climate Research and HEP Applications

Author: Shannigrahi Susmit
Publication date: 01/08/2023

Ezid

Request aggregation, caching, and forwarding strategies for improving large climate data distribution with NDN: A case study

Author: Fan Chengyu
Papadopoulos Christos
Shannigrahi Susmit
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/09/2017

Scientific domains such as Climate Science, High Energy Particle Physics (HEP), and others routinely generate and manage petabytes of data, projected to rise into exabytes [26]. The sheer volume and long life of the data stress IP networking and traditional content distribution network mechanisms. Thus, each scientific domain typically designs, develops, implements, deploys, and maintains its own data management and distribution system, often duplicating functionality. Supporting various incarnations of similar software is wasteful, prone to bugs, and results in an ecosystem of one-off solutions. In this paper, we present the first trace-driven study that investigates NDN in the context of a scientific application domain. Our contribution is threefold. First, we analyze a three-year climate data server log and characterize data access patterns to expose important variables such as cache size. Second, using an approximated topology derived from the log, we replay log requests in real time over an NDN simulator to evaluate how NDN improves traffic flows through aggregation and caching. Finally, we implement a simple nearest-replica NDN forwarding strategy and evaluate how NDN can improve scientific content delivery.

University of Memphis Digital Commons

Scari: A strategic caching and reservation protocol for ICN

Author: Fan Chengyu
Papadopoulos Christos
Shannigrahi Susmit
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/09/2018

Point-to-point resource reservation solutions over IP networks are often end-to-end, and data flowing through these reserved tunnels is not reusable. As a result, in-network resources are not optimally utilized. Information Centric Networking (ICN) has several properties that can facilitate resource reservations more intelligently. In this paper, we present Strategic Caching And Reservation in ICN (SCARI) for reserving resources on ICN networks. Preliminary simulation results indicate that SCARI can reduce bandwidth consumption and free up network resources by aggregating reservation requests and strategically caching content in the network.

University of Memphis Digital Commons

Crossref